
Summary of ABC for IBD traits

This report summarizes statistics produced by the disease variant validation pipeline for predictions made by ABC.
The results below are for IBD traits.


Fig1. ABC maps connecting fine-mapped variants to enhancers, genes and cell types. a, Enrichment of fine-mapped IBD variants (PIP>10%) in ABC enhancers. b, Fraction of fine-mapped IBD variants (PIP>10%) overlapping ABC enhancers. c, Precision–recall plot of connections between noncoding IBD credible sets and known IBD-associated genes. d, Precision–recall plot of connections between noncoding IBD credible sets and known IBD-associated genes across different prediction methods.

To what extent are IBD GWAS variants enriched in predicted enhancers?

Enrichment of GWAS variants in enhancers, including and excluding promoter regions

These bar plots show the enrichment of fine-mapped IBD variants (PIP >= 10%) in ABC enhancers (left) and in enhancers without promoter regions (right; promoter regions as defined by RefSeq) across all biosamples.

Fig2. Enrichment of fine-mapped IBD variants (PIP >= 10%) in enhancers (including promoter regions) across all biosamples.

Enrichment tables

Tables with the values used to derive the enrichment and fraction-overlap values for enhancers (including promoter regions) and enhancers (excluding promoter regions).
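As an illustration of how an enrichment value could be derived from the counts in such a table, the sketch below assumes enrichment is the ratio of two overlap fractions: the fraction of fine-mapped variants in enhancers divided by the fraction of all common variants in enhancers. The function name, argument names and counts are hypothetical and are not taken from the pipeline.

```python
def enrichment(n_finemapped_in_enh, n_finemapped, n_common_in_enh, n_common):
    """Ratio of overlap fractions (assumed definition of enrichment).

    Numerator: fraction of fine-mapped variants that fall in enhancers.
    Denominator: fraction of all common variants that fall in enhancers.
    """
    if n_finemapped == 0 or n_common == 0 or n_common_in_enh == 0:
        raise ValueError("counts must be non-zero to form a ratio")
    frac_finemapped = n_finemapped_in_enh / n_finemapped
    frac_background = n_common_in_enh / n_common
    return frac_finemapped / frac_background


# Hypothetical counts for a single biosample:
# 30 of 200 fine-mapped variants overlap enhancers (15%),
# 50,000 of 5,000,000 common variants overlap enhancers (1%).
value = enrichment(30, 200, 50_000, 5_000_000)
print(f"enrichment = {value:.1f}")
```

With these made-up counts the fine-mapped variants overlap enhancers about 15 times more often than common variants do.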

ABC

What fraction of IBD variants overlap predicted enhancers?

Variant overlap of GWAS variants for enhancers (including promoter regions) (left) and enhancers (excluding promoter regions) (right)

We plot the fraction of fine-mapped IBD variants (PIP >= 10%) that overlap ABC enhancers and enhancers without promoter regions (promoter regions as defined by RefSeq) across all biosamples.
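A minimal sketch of the fraction-overlap calculation for one biosample is given below. It assumes variants are given as (chromosome, position) pairs and enhancers as half-open intervals [start, end); these data layouts and the function names are illustrative assumptions, not the pipeline's actual format.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_overlapping(variants, enhancers):
    """Fraction of variants whose position falls in any enhancer interval."""
    by_chrom = {}
    for chrom, start, end in enhancers:
        by_chrom.setdefault(chrom, []).append((start, end))
    # After merging, intervals are disjoint, so one bisect lookup suffices.
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = merge_intervals(intervals)
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    hits = 0
    for chrom, pos in variants:
        starts, ends = index.get(chrom, ([], []))
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < ends[i]:
            hits += 1
    return hits / len(variants) if variants else 0.0


# Hypothetical enhancers and fine-mapped variants (PIP >= 10%).
enhancers = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
variants = [("chr1", 120), ("chr1", 299), ("chr1", 300), ("chr2", 10)]
frac = fraction_overlapping(variants, enhancers)
print(frac)
```

Repeating this per biosample gives one value per biosample, which is what the box plots summarize.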

Fig3. Fraction overlap of fine-mapped IBD variants (PIP >= 10%) in enhancers (including promoter regions) across all biosamples. Box plots show median (middle line) and interquartile range (boxes) and whiskers show observations less than or equal to quartiles ± 1.5× the interquartile range.

How well do predictions connect noncoding credible sets to known disease genes?

These precision-recall curves show the performance of choosing the gene with the best score in each locus.
Precision = fraction of identified genes that are on the list of known genes affecting IBD.
Recall = fraction of known genes that were identified.
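The definitions above can be sketched in code as follows. This is an illustration only: it assumes one known gene per credible set, picks the top-scoring gene per credible set, and sweeps a score threshold to trace the curve. The dictionary layouts, gene names and scores are hypothetical.

```python
def precision_recall(predictions, known_gene, threshold):
    """Precision and recall when choosing the best-scoring gene per credible set.

    predictions: {credible_set: {gene: score}}
    known_gene:  {credible_set: known gene for that credible set}
    A prediction is made only if the best score is >= threshold.
    """
    n_predicted = 0
    n_correct = 0
    for credible_set, truth in known_gene.items():
        scores = predictions.get(credible_set, {})
        if not scores:
            continue  # no prediction for this credible set
        best_gene = max(scores, key=scores.get)
        if scores[best_gene] < threshold:
            continue
        n_predicted += 1
        if best_gene == truth:
            n_correct += 1
    precision = n_correct / n_predicted if n_predicted else float("nan")
    recall = n_correct / len(known_gene) if known_gene else float("nan")
    return precision, recall


# Hypothetical scores for four credible sets with known genes.
predictions = {
    "cs1": {"GENE_A": 0.9, "GENE_B": 0.2},
    "cs2": {"GENE_C": 0.5, "GENE_D": 0.4},
    "cs3": {"GENE_E": 0.1},
}
known_gene = {"cs1": "GENE_A", "cs2": "GENE_D", "cs3": "GENE_E", "cs4": "GENE_F"}

strict = precision_recall(predictions, known_gene, threshold=0.3)
lenient = precision_recall(predictions, known_gene, threshold=0.0)
print(strict, lenient)
```

Lowering the threshold admits more predictions, which typically raises recall at the cost of precision; each threshold contributes one point to the curve.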
Fig5. Precision–recall plot of connections between noncoding IBD credible sets and known IBD-associated genes, where recall is the fraction of credible sets for which the known gene is identified (sensitivity) and precision is the fraction of predicted genes corresponding to known genes (positive predictive value). As baselines, we also show the heuristic of assigning each GWAS credible set to the closest gene, a method that is widely used to annotate GWAS loci, and a similar approach that selects the closest transcription start site (TSS).


Summary enrichment plots across all predictions


Aggregate enrichment values across all predictors for all enhancer regions (without promoters). Enrichment values are presented in two ways: a cumulative density plot (left) and bar plots (right).

Fig6. Aggregate enrichment plots, cumulative density plot (left) and barplots (right) of GWAS variants in enhancers (excluding promoter regions) across different predictors. These are enrichment values calculated for IBD traits across all predictors.

Consolidating enrichment values across all traits and all predictions for enhancers (including promoter regions).

Fig7. Aggregate enrichment plots, cumulative density plot (left) and barplots (right) of GWAS variants in enhancers (including promoter regions) across different predictors. These are enrichment values calculated for IBD traits across all predictors.

Methods

GWAS traits and loci
Summary statistics for IBD, Crohn’s disease and ulcerative colitis (European ancestry only; ref. 51) were obtained from https://www.ibdgenetics.org/downloads.html. We obtained fine-mapping results and summary statistics for 71 other traits from an unpublished analysis of UK Biobank data (Jacob Ul, M. Kanai and Hillary Finucane, unpublished data; fine-mapping data are available at https://www.finucanelab.org/data). In this analysis, up to 361,194 individuals of white British ancestry with available phenotypes were included in the GWAS, together with variants with INFO > 0.8, minor allele frequency > 0.01% and Hardy–Weinberg equilibrium P > 1 × 10⁻¹⁰. For all traits, except where specified, we considered only the ‘noncoding credible sets’, that is, those that did not contain any variant in a coding sequence or within 10 bp of a splice site annotated in the RefGene database (downloaded from UCSC Genome Browser on 24 June 2017).
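The two filters described above can be sketched as simple predicates. The thresholds come from the text; the field names (`info`, `maf`, `hwe_p`, `in_cds`, `dist_to_splice`) are hypothetical, and treating "within 10 bp" as a distance of at most 10 bp is an assumption.

```python
def passes_qc(info, maf, hwe_p):
    """Variant-level QC: INFO > 0.8, MAF > 0.01%, Hardy-Weinberg P > 1e-10."""
    return info > 0.8 and maf > 0.0001 and hwe_p > 1e-10


def is_noncoding_credible_set(variants, splice_window=10):
    """True if no variant is coding or within `splice_window` bp of a splice site.

    Each variant is a dict with:
      in_cds         - bool, variant lies in a coding sequence
      dist_to_splice - distance in bp to the nearest splice site, or None
    """
    for variant in variants:
        if variant["in_cds"]:
            return False
        distance = variant["dist_to_splice"]
        if distance is not None and distance <= splice_window:
            return False
    return True


# Hypothetical variants for the QC filter.
candidates = [
    {"info": 0.95, "maf": 0.05, "hwe_p": 0.3},      # passes
    {"info": 0.70, "maf": 0.05, "hwe_p": 0.3},      # fails INFO
    {"info": 0.95, "maf": 0.00005, "hwe_p": 0.3},   # fails MAF
    {"info": 0.95, "maf": 0.05, "hwe_p": 1e-12},    # fails HWE
]
kept = [v for v in candidates if passes_qc(**v)]

# Hypothetical credible sets.
cs_noncoding = [{"in_cds": False, "dist_to_splice": 250},
                {"in_cds": False, "dist_to_splice": None}]
cs_splice = [{"in_cds": False, "dist_to_splice": 8}]
cs_coding = [{"in_cds": True, "dist_to_splice": 500}]
print(len(kept), is_noncoding_credible_set(cs_noncoding))
```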

Defining enriched biosamples for each trait
For a given trait, we intersected variants with PIP ≥ 10% in noncoding credible sets with ABC enhancers (or other genomic annotations). For each biosample, we calculated a P value using a binomial test comparing the fraction of PIP ≥ 10% variants that overlapped ABC enhancers with the fraction of all common variants that overlap ABC enhancers in that cell type. We calculated the latter using common variants in the 1000 Genomes Project as described in the ‘Stratified linkage disequilibrium score regression’ section. For each trait, we defined a biosample as significantly enriched for that trait if the Bonferroni-corrected binomial P value was <0.001.
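The test described above could be sketched as follows, using only the standard library. Two points are assumptions rather than statements from the text: the test is taken to be one-sided (upper tail), and the Bonferroni correction multiplies each P value by the number of biosamples tested. All counts and biosample names are made up.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enriched_biosamples(counts, n_variants, alpha=0.001):
    """Bonferroni-corrected binomial test per biosample.

    counts: {biosample: (n_overlapping_variants, background_fraction)}
    Returns {biosample: (adjusted_p, is_significant)}.
    """
    n_tests = len(counts)
    results = {}
    for biosample, (k, background) in counts.items():
        p_value = binomial_upper_tail(k, n_variants, background)
        adjusted = min(1.0, p_value * n_tests)  # Bonferroni correction
        results[biosample] = (adjusted, adjusted < alpha)
    return results


# Hypothetical: 100 variants with PIP >= 10%; 2% of common variants
# overlap enhancers in each biosample.
counts = {"monocyte": (15, 0.02), "hepatocyte": (2, 0.02)}
results = enriched_biosamples(counts, n_variants=100)
for name, (p_adj, significant) in results.items():
    print(name, f"{p_adj:.3g}", significant)
```

Direct summation is adequate for a few hundred variants; for much larger counts a library routine working in log space would be a safer choice.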

Comparison of enrichment of fine-mapped variants in enhancer regions and other enhancer-gene predictions
We compared the enrichment of fine-mapped variants in ABC enhancers and other enhancer definitions. We analysed each of the previous studies shown in Fig1d (summary plot) that report cell-type-specific enhancer-gene predictions. For each of the methods below, we downloaded previous predictions of enhancer–gene links and assessed their ability to identify IBD-associated genes. For this analysis, we used the predictions from each method to overlap fine-mapped variants (PIP ≥ 10%) with enhancers in any cell type and assigned variants to the predicted gene(s).
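The overlap-and-assign step can be illustrated with the sketch below. It assumes each prediction is a tuple of enhancer coordinates (half-open), target gene, cell type and score, and it keeps the highest score per gene across cell types; both the tuple layout and the max-score rule are illustrative assumptions, not a description of any specific method's output.

```python
def assign_variants_to_genes(variants, predictions, min_pip=0.10):
    """Link fine-mapped variants to genes via overlapping predicted enhancers.

    variants:    list of (variant_id, chrom, pos, pip)
    predictions: list of (chrom, start, end, gene, cell_type, score)
    Returns {variant_id: {gene: best score across all cell types}}.
    """
    assigned = {}
    for variant_id, chrom, pos, pip in variants:
        if pip < min_pip:
            continue  # keep only variants with PIP >= 10%
        genes = {}
        for p_chrom, start, end, gene, _cell_type, score in predictions:
            if p_chrom == chrom and start <= pos < end:
                genes[gene] = max(score, genes.get(gene, float("-inf")))
        if genes:
            assigned[variant_id] = genes
    return assigned


# Hypothetical variants and enhancer-gene predictions.
variants = [
    ("rs1", "chr1", 150, 0.50),
    ("rs2", "chr1", 150, 0.05),  # below the PIP threshold
    ("rs3", "chr2", 10, 0.90),   # overlaps no enhancer
]
predictions = [
    ("chr1", 100, 200, "GENE_A", "monocyte", 0.03),
    ("chr1", 120, 180, "GENE_A", "T cell", 0.08),
    ("chr1", 140, 160, "GENE_B", "monocyte", 0.02),
]
links = assign_variants_to_genes(variants, predictions)
print(links)
```

The nested loop is quadratic and only suitable for small inputs; an interval index would be the natural replacement at genome scale.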

Enhancer-gene correlation (ChromHMM2017)
Gene expression was previously correlated with five active chromatin marks (H3K27ac, H3K9ac, H3K4me1, H3K4me2 and DNase I hypersensitivity) across 56 biosamples, and these correlation links were then used to make predictions for the predicted enhancers (regions with the ‘7Enh’ ChromHMM state) in 127 biosamples from the Roadmap Epigenome Atlas. We downloaded these predictions from www.biolchem.ucla.edu/labs/ernst/roadmaplinking and made predictions using the confidence score.


Enhancer-gene correlation (EpiMap)
Gene expression was previously correlated with five active chromatin marks (H3K27ac, H3K9ac, H3K4me1, H3K4me2 and H3K4me3) across 304 biosamples. A negative set of correlations for each enhancer was computed using random genes on a different chromosome. Links were predicted for each biosample and ChromHMM enhancer state separately (states E7, E8, E9, E10, E11 and E15). Predictions were made by training an XGBoost classifier on the positive set of all valid links against their paired negative links, using precomputed correlations and distance to the transcription start site as features, and keeping all links with a probability above 5/7. We downloaded these predictions from https://personal.broadinstitute.org/cboix/epimap/links/.